Hydrology#

In the following sections, we use Python to demonstrate how to gather data for a specific watershed in the HYDAT database

Environment setup#

[1]:
import xarray as xr
import intake
import hvplot.xarray

Accessing the data#

We are now ready to access our catalog which uses Intake to organize all our datasets.

[2]:
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat=intake.open_catalog(catalog_url)
cat
main:
  args:
    path: https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml
  description: Master Data Catalog
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}

Let’s open ERA5-Land reanalysis (reference dataset) from which we will gather data at our watershed of interest

[3]:
ds_era5l = cat.atmosphere.era5_land_reanalysis.to_dask()
ds_era5l
[3]:
<xarray.Dataset>
Dimensions:    (latitude: 701, longitude: 1171, time: 636240)
Coordinates:
  * latitude   (latitude) float64 85.0 84.9 84.8 84.7 ... 15.3 15.2 15.1 15.0
  * longitude  (longitude) float64 -167.0 -166.9 -166.8 ... -50.2 -50.1 -50.0
  * time       (time) datetime64[ns] 1950-01-01 ... 2022-07-31T23:00:00
Data variables:
    sd         (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>
    t2m        (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>
    tp         (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>

Here, we open HYDAT dataset as a xarray Dataset :

[4]:
ds_hydat = cat.hydrology.hydat.to_dask()
ds_hydat
[4]:
<xarray.Dataset>
Dimensions:        (data_type: 2, id: 7881, latitude: 2800, longitude: 4680,
                    spatial_agg: 2, time: 59413, time_agg: 1, timestep: 1)
Coordinates: (12/13)
  * data_type      (data_type) <U5 'flow' 'level'
    drainage_area  (id) float64 dask.array<chunksize=(10,), meta=np.ndarray>
  * id             (id) <U7 '01AA002' '01AD001' ... '11AF004' '11AF005'
  * latitude       (latitude) float64 85.0 84.97 84.95 ... 15.07 15.05 15.02
  * longitude      (longitude) float64 -167.0 -167.0 -166.9 ... -50.05 -50.02
    name           (id) object dask.array<chunksize=(7881,), meta=np.ndarray>
    ...             ...
    regulated      (id) float64 dask.array<chunksize=(10,), meta=np.ndarray>
    source         (id) object dask.array<chunksize=(7881,), meta=np.ndarray>
  * spatial_agg    (spatial_agg) object 'point' 'watershed'
  * time           (time) datetime64[ns] 1860-01-01 1860-01-02 ... 2022-08-31
  * time_agg       (time_agg) <U4 'mean'
  * timestep       (timestep) <U3 'day'
Data variables:
    mask           (id, latitude, longitude) float64 dask.array<chunksize=(1, 500, 500), meta=np.ndarray>
    value          (id, time, data_type, spatial_agg, timestep, time_agg) float64 dask.array<chunksize=(10, 59413, 1, 1, 1, 1), meta=np.ndarray>

In the HYDAT dataset, we provide the mask variable which represents rasterized watershed deleniation at a 0.025 deg spatial resolution for most ids (stations) where streaflow is recorded. For instance, we can vizualize the mask for the 01AA002 station :

[5]:
%%time
station_id = '01AA002'

mask = ds_hydat.sel(id=station_id).mask

mask \
.where(mask==1, drop=True) \
.hvplot(geo=True, tiles='ESRI', width=750, height=500, colorbar=False, alpha=0.8)
CPU times: user 15.1 s, sys: 760 ms, total: 15.9 s
Wall time: 55 s
[5]:

We then interpolate our mask to fit the reference dataset’s spatial resolution :

[6]:
%%time
mask = mask.interp_like(ds_era5l)
mask = mask.where(mask==1, drop=True)

mask \
.hvplot(geo=True, tiles='ESRI', width=750, height=500, colorbar=False, alpha=0.8)
CPU times: user 13 s, sys: 521 ms, total: 13.5 s
Wall time: 51.6 s
[6]:

We can now apply the mask to the reference dataset and average the data on the mask :

[7]:
%%time
da = ds_era5l.t2m.where(mask==1, drop=True)
da
CPU times: user 12.9 s, sys: 285 ms, total: 13.1 s
Wall time: 47.3 s
[7]:
<xarray.DataArray 't2m' (time: 636240, latitude: 3, longitude: 3)>
dask.array<where, shape=(636240, 3, 3), dtype=float32, chunksize=(8760, 2, 3), chunktype=numpy.ndarray>
Coordinates:
  * latitude       (latitude) float64 46.6 46.5 46.4
  * longitude      (longitude) float64 -70.3 -70.2 -70.1
  * time           (time) datetime64[ns] 1950-01-01 ... 2022-07-31T23:00:00
    drainage_area  float64 dask.array<chunksize=(), meta=np.ndarray>
    id             <U7 '01AA002'
    name           object dask.array<chunksize=(), meta=np.ndarray>
    province       object dask.array<chunksize=(), meta=np.ndarray>
    regulated      float64 dask.array<chunksize=(), meta=np.ndarray>
    source         object dask.array<chunksize=(), meta=np.ndarray>
Attributes: (12/31)
    GRIB_NV:                                  0
    GRIB_Nx:                                  1171
    GRIB_Ny:                                  701
    GRIB_cfName:                              unknown
    GRIB_cfVarName:                           t2m
    GRIB_dataType:                            fc
    ...                                       ...
    GRIB_typeOfLevel:                         surface
    GRIB_units:                               K
    coordinates:                              number time step surface latitu...
    long_name:                                2 metre temperature
    standard_name:                            unknown
    units:                                    K
[8]:
%%time
da \
.mean(['latitude','longitude']) \
.hvplot(color='blue')
CPU times: user 4.69 s, sys: 461 ms, total: 5.15 s
Wall time: 24.4 s
[8]: